Modélisation de contextes pour l'annotation sémantique de vidéos. (Context based modeling for video semantic annotation)

Author

Nicolas Ballas

Abstract

Recent years have witnessed an explosion in the amount of available multimedia content. In 2010, the video-sharing website YouTube announced that 35 hours of video were uploaded to the site every minute, whereas in 2008 users were “only” uploading 12 hours of video per minute. Given these growing data volumes, manual analysis of each video is no longer feasible, and automated video analysis systems are needed. This thesis proposes a solution for automatically annotating video content with a textual description. Its core novelty is the use of multiple sources of contextual information to perform the annotation.

With the constant expansion of online visual collections, automatic video annotation has become a major problem in computer vision. It consists of detecting various objects (humans, cars, ...), dynamic actions (running, driving, ...), and scene characteristics (indoor, outdoor, ...) in unconstrained videos. Progress in this area would impact a wide range of applications, including video search, intelligent video surveillance, and human-computer interaction. Although some improvements have been reported in concept annotation, the problem remains unsolved, notably because of the semantic gap: the lack of correspondence between low-level video features and high-level human understanding. This gap is mainly due to the intra-class variability of concepts, caused by photometric changes, object deformation, object motion, camera motion, viewpoint changes, and so on.

To bridge the semantic gap, we enrich the description of a video with multiple sources of contextual information. Context is defined as “the set of circumstances in which an event occurs”. Video appearance, motion, or space-time distribution can all be considered contextual clues associated with a concept. We argue that a single context is not informative enough to discriminate a concept in a video; by considering several contexts at the same time, however, we can address the semantic gap.
More precisely, the major contributions of the thesis are the following:
• A novel framework that takes several sources of contextual information into account: to benefit from multiple contextual clues, we introduce a fusion scheme based on a generalized sparsity criterion. This fusion model automatically infers the set of relevant contexts for a given concept.
• Context modeling through feature interdependencies: different features capture complementary information. For instance, the Histogram of Oriented Gradients (HoG) focuses on video appearance, while the Histogram of Optical Flow (HoF) collects motion information. Most existing works capture the statistics of different features independently. By contrast, we leverage their covariance to refine our video signature.
• A concept-dependent model of space-time context: discriminative information is not equally distributed across the video's space-time domain. To identify the discriminative regions, we introduce a learning algorithm that determines the space-time shape associated with each individual concept.
• Attention context modeling: we enrich video signatures with biologically inspired attention maps. Such maps capture space-time contextual information while preserving the invariance of the video signature to translation, rotation, and scaling. Without this space-time invariance, instances of the same concept at different locations in the space-time volume can result in divergent representations. This problem is severe for dynamic actions, which exhibit dramatic space-time variability.
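
The feature-interdependence idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's actual formulation: it assumes each video is already described by N local points, each carrying one HoG vector and one HoF vector, and it simply appends the flattened HoG-HoF cross-covariance to the per-feature means. The function names (`mean`, `cross_covariance`, `video_signature`) and the toy numbers are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Per-dimension mean of N descriptor vectors of equal length."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(dim)]


def cross_covariance(a: Sequence[Sequence[float]],
                     b: Sequence[Sequence[float]]) -> List[Vector]:
    """Sample cross-covariance C[i][j] = cov(a_i, b_j) over N paired points.

    a[k] and b[k] must come from the same local point k (e.g. the HoG and
    HoF descriptors computed around one space-time location).
    """
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need at least two paired descriptors")
    mean_a, mean_b = mean(a), mean(b)
    return [
        [
            sum((a[k][i] - mean_a[i]) * (b[k][j] - mean_b[j]) for k in range(n)) / (n - 1)
            for j in range(len(mean_b))
        ]
        for i in range(len(mean_a))
    ]


def video_signature(hog: Sequence[Sequence[float]],
                    hof: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical signature: independent statistics (HoG and HoF means)
    followed by joint statistics (flattened HoG-HoF cross-covariance)."""
    cov = cross_covariance(hog, hof)
    return mean(hog) + mean(hof) + [c for row in cov for c in row]


if __name__ == "__main__":
    # Toy data (hypothetical): 3 local points, 2-D "HoG", 1-D "HoF".
    hog = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
    hof_up = [[0.5], [1.0], [1.5]]    # motion grows with appearance
    hof_down = [[1.5], [1.0], [0.5]]  # motion shrinks as appearance grows
    print(video_signature(hog, hof_up))    # → [2.0, 1.0, 1.0, 0.5, 0.5]
    print(video_signature(hog, hof_down))  # → [2.0, 1.0, 1.0, -0.5, -0.5]
```

In this toy case the two videos have identical HoG and HoF means, so a signature built only from independent statistics cannot tell them apart; only the covariance terms (0.5 versus -0.5) reveal that appearance and motion vary together in one video and in opposite directions in the other.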


Similar resources

Du texte à la connaissance : annotation sémantique et peuplement d'ontologie appliqués à des artefacts logiciels

Abstract: Software applications generally have a steep learning curve for new developers and for those who wish to integrate parts of them into their own applications. The appeal of using a semantics-based technology here lies in its potential to associate a knowledge network with existing software artifacts, whether structured or not. This...

Profil générique sémantique pour l'adaptation de documents multimédias

Abstract. Multimedia documents can now be accessed anytime and anywhere on a wide variety of mobile devices. The heterogeneity of these platforms, user preferences, and also the viewing context require documents to be adapted to certain constraints, for example not playing audio content when the user is particip...

Construction de hiérarchies sémantiques pour l'annotation d'images

This paper proposes a new methodology to automatically build semantic hierarchies suitable for image annotation and classification. The building of the hierarchy is based on a new measure of semantic similarity. The proposed measure incorporates several sources of information: visual, conceptual, and contextual, as defined in this paper. The aim is to provide a measure that best represents im...

Ontologies étendues pour l'annotation sémantique

Abstract: This article attempts to formalize the process of semantically annotating a text with respect to an ontology. Semantic annotation maps text fragments to the elements of an ontology, but the whole difficulty lies in identifying which fragments to annotate and which labels to associate with them. We propose extending ontologies with rules for anno...

Using distributed word representations for robust semantic role labeling (Utilisation de représentations de mots pour l'étiquetage de rôles sémantiques suivant FrameNet) [in French]

Abstract. According to Fillmore's frame semantics, words take their meaning from the event or situational context in which they occur. FrameNet, a lexical resource for English, defines about 1,000 conceptual frames covering most possible contexts. Within a conceptual frame, a predicate calls for arguments to fill the various sem... roles

Publication date: 2013
